Abstract:

This presentation will update the HPEC community on the latest status of the standard Data
Reorganization Interface (DRI). DRI is a software interface for performing data-parallel
distribution and reorganization operations (e.g., transpose, reshape) that are frequently required
in scalable HPEC applications. DRI is easier to use than point-to-point middleware because it
provides abstractions for multi-dimensional datasets, partitioning and distribution methods
(e.g., block, block-cyclic, overlapped elements), and a high-level interface that frees
applications from having to orchestrate the multitude of individual transfers required in
a single data reorganization. A planned transfer approach in DRI enables high-performance data
transfers, and its multi-buffering semantics enable (with hardware support) an application’s
communication and computation operations to overlap in time. DRI is designed to enhance existing
standard and proprietary middleware by adding a standard, easy-to-use interface without
compromising high performance.

The DRI-1.0 API was ratified and published in September 2002 by the Data Reorganization
Forum and was announced at the HPEC 2002 workshop. This presentation discusses DRI-related
activities since that announcement, including current vendor implementation status, a summary
of results from the first use of DRI in a realistic application demonstration (SAR image
formation), and candidate features that could be added to an enhanced DRI standard. The
DRI-1.0 document can be accessed on the World Wide Web at http://www.data-re.org.
Status update for the DRI-1.0 standard
since Sep. 2002 publication

•  DRI Overview.

•  Highlights of First DRI Demonstration.

   >  Common Imagery Processor (Brian Sroka, MITRE).

•  Vendor Status.

   >  Mercury Computer Systems, Inc.
   >  MPI Software Technology, Inc. (Anthony Skjellum).
   >  SKY Computers, Inc. (Stephen Paavola).
What is DRI?


Standard  API  that  complements 
existing  communication  middleware 

Partition  for  data-parallel  processing 

>  Divide  multi-dimensional  dataset  across  processes 

>  Whole,  block,  block-cyclic  partitioning 

>  Overlapped  data  elements  in  partitioning 

>  Process  group  topology  specification 
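The partitioning styles listed above can be illustrated with a minimal sketch for a single dimension. The function names are hypothetical and are not part of the DRI-1.0 API; the rule used here for uneven block sizes (earlier processes take one extra element) is an assumption, not something the standard is known to mandate.

```python
def block_partition(n, nprocs):
    """Split indices 0..n-1 into nprocs contiguous blocks, as evenly as possible.

    Assumption: when n is not divisible by nprocs, the first (n % nprocs)
    processes each receive one extra element.
    """
    base, extra = divmod(n, nprocs)
    parts, start = [], 0
    for p in range(nprocs):
        size = base + (1 if p < extra else 0)
        parts.append(list(range(start, start + size)))
        start += size
    return parts


def block_cyclic_partition(n, nprocs, block):
    """Deal fixed-size blocks of indices to processes in round-robin order."""
    parts = [[] for _ in range(nprocs)]
    for b, start in enumerate(range(0, n, block)):
        parts[b % nprocs].extend(range(start, min(start + block, n)))
    return parts


def add_right_overlap(parts, n, overlap):
    """Extend each contiguous block with up to `overlap` elements past its end."""
    return [
        p + list(range(p[-1] + 1, min(p[-1] + 1 + overlap, n))) if p else p
        for p in parts
    ]


if __name__ == "__main__":
    print(block_partition(10, 4))
    print(block_cyclic_partition(10, 2, 2))
    print(add_right_overlap(block_partition(10, 4), 10, 1))
```

A "whole" partition is the degenerate case in which every process holds all n indices of that dimension.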


Redistribute  data  to  next  processing  stage 

>  Multi-point data transfer with single function call
>  Multi-buffered to enable communication / computation overlap
>  Planned transfers for higher performance
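The idea behind multi-buffering can be sketched as a double-buffered loop in which the transfer of the next dataset is started before computation on the current one begins. This is only an illustration under stated assumptions: a worker thread stands in for the hardware support that performs transfers in the background, and `process_stream`, `transfer`, and `compute` are hypothetical names, not DRI-1.0 functions.

```python
from concurrent.futures import ThreadPoolExecutor


def process_stream(n_frames, transfer, compute):
    """Overlap the transfer of frame i+1 with computation on frame i.

    transfer(i) returns the buffer for frame i; compute(buf) consumes it.
    """
    results = []
    if n_frames == 0:
        return results
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(transfer, 0)  # start filling the first buffer
        for i in range(n_frames):
            buf = pending.result()  # wait until the current buffer is ready
            if i + 1 < n_frames:
                # Start the next transfer before computing on this buffer.
                pending = pool.submit(transfer, i + 1)
            results.append(compute(buf))
    return results


if __name__ == "__main__":
    print(process_stream(3, lambda i: [i] * 3, sum))
```

With a single buffer, each transfer would have to finish before computation could start and vice versa; the second buffer is what allows the two to proceed at the same time.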
Mercury Computer Systems, Inc.


First  DRI-based 
Demonstration 


Common  Imagery  Processor  (CIP) 

Conducted  by  Brian  Sroka  of  The  MITRE  Corporation 


CIP  and  APG-73  Background 


CIP 


The  primary  sensor  processing 
element  of  the  Common  Imagery 
Ground/Surface  System  (CIGSS) 

Processes  imagery  data  into 
exploitable  image,  outputs  to  other 
CIGSS  elements 

A  hardware  independent  software 
architecture  supporting  multi-sensor 
processing  capabilities 

Prime  Contractor:  Northrop 
Grumman,  Electronic  Sensor  Systems 
Sector 

Enhancements  directed  by  CIP  Cross- 
Service  IPT,  Wright  Patterson  AFB 


APG-73 

SAR  component  of  F/A-18  Advanced 
Tactical  Airborne  Reconnaissance 
System  (ATARS) 

Imagery  from  airborne  platforms 
sent  to  TEG  via  Common  Data  Link 


Source:  “HPEC-SI  Demonstration:  Common  Imagery  Processor  (CIP)  Demonstration  -  APG73  Image  Formation”,  Brian  Sroka,  The  MITRE  Corporation,  HPEC  2002 


MITRE 
APG-73 Data Reorganization (1)


[Figure: block data distribution across processes P0, P1, ..., Pn-1, Pn; vertical axis: Range, horizontal axis: Cross Range]

Source 


•  No  overlap 

•  Blocks  of  cross-range 

•  Full  range 


Destination 


•  No  overlap 

•  Full  cross-range 

•  Mod-16  length  blocks  of  range 


Source:  “CIP  APG-73  Demonstration:  Lessons  Learned”,  Brian  Sroka,  The  MITRE  Corporation,  March  2003  HPEC-SI  meeting 
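The reorganization described on this slide (source blocked along cross-range, destination blocked along range in lengths that are multiples of 16) can be sketched as a precomputed transfer plan: every source/destination pair exchanges the sub-rectangle where the destination's range rows meet the source's cross-range columns. All names below are hypothetical and do not come from the DRI-1.0 API, and the way block sizes are balanced is an assumption.

```python
def blocks(n, nprocs, multiple=1):
    """Contiguous [start, stop) blocks whose lengths are multiples of `multiple`
    (the final non-empty block may be shorter if n is not a multiple)."""
    units = -(-n // multiple)  # ceiling division
    base, extra = divmod(units, nprocs)
    out, start = [], 0
    for p in range(nprocs):
        size = (base + (1 if p < extra else 0)) * multiple
        stop = min(start + size, n)
        out.append((start, stop))
        start = stop
    return out


def cornerturn_plan(n_range, n_cross, nprocs, multiple=16):
    """List (src, dst, range_rows, cross_cols) for every non-empty transfer."""
    src_cols = blocks(n_cross, nprocs)            # source: blocks of cross-range
    dst_rows = blocks(n_range, nprocs, multiple)  # destination: blocks of range
    plan = []
    for s, (c0, c1) in enumerate(src_cols):
        for d, (r0, r1) in enumerate(dst_rows):
            if r0 < r1 and c0 < c1:
                plan.append((s, d, (r0, r1), (c0, c1)))
    return plan


def run_cornerturn(matrix, nprocs, multiple=16):
    """Simulate the plan in one address space; matrix[r][c] is range r, cross-range c."""
    n_range, n_cross = len(matrix), len(matrix[0])
    src_cols = blocks(n_cross, nprocs)
    dst_rows = blocks(n_range, nprocs, multiple)
    # Source buffers: every range row, only the owned cross-range columns.
    src_local = [[row[c0:c1] for row in matrix] for (c0, c1) in src_cols]
    # Destination buffers: owned range rows, every cross-range column.
    dst_local = [[[None] * n_cross for _ in range(r1 - r0)] for (r0, r1) in dst_rows]
    for s, d, (r0, r1), (c0, c1) in cornerturn_plan(n_range, n_cross, nprocs, multiple):
        first_row = dst_rows[d][0]
        for r in range(r0, r1):
            dst_local[d][r - first_row][c0:c1] = src_local[s][r]
    return dst_local


if __name__ == "__main__":
    m = [[(r, c) for c in range(6)] for r in range(32)]
    print(len(cornerturn_plan(32, 6, 2)))
    print(run_cornerturn(m, 2) == [m[0:16], m[16:32]])
```

The plan contains up to nprocs × nprocs individual transfers, which is the bookkeeping that a single high-level reorganization call spares the application.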


APG-73  Data  Reorganization  (2) 


[Figure: block data distribution; vertical axis: Range, horizontal axis: Cross-Range]


Source 


•  No  overlap 

•  Full  cross-range 

•  Mod-16  length  blocks  of  range  cells 


Destination 


•  7  points  right  overlap 

•  Full  cross-range 

•  Blocked  portion  of  range  cells 




Source:  “CIP  APG-73  Demonstration:  Lessons  Learned”,  Brian  Sroka,  The  MITRE  Corporation,  March  2003  HPEC-SI  meeting 
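The overlap exchange on this slide can be sketched as a plan that tells each process which elements it must receive from its higher-indexed neighbours to obtain a 7-point right overlap. The sketch assumes that the overlap lies along the blocked dimension and extends toward higher indices; the function name and the tuple layout are hypothetical, not DRI-1.0 API.

```python
def overlap_exchange_plan(bounds, overlap):
    """Compute the transfers needed to give every block a right overlap.

    bounds: per-process (start, stop) index ranges, ascending and non-overlapping.
    Returns a list of (sender, receiver, (start, stop)) tuples.
    """
    n = bounds[-1][1]  # total extent of the blocked dimension
    plan = []
    for recv, (_, stop) in enumerate(bounds):
        need0, need1 = stop, min(stop + overlap, n)  # elements just past the block
        for send in range(recv + 1, len(bounds)):
            s0, s1 = bounds[send]
            lo, hi = max(need0, s0), min(need1, s1)
            if lo < hi:
                plan.append((send, recv, (lo, hi)))
    return plan


if __name__ == "__main__":
    # Three blocks of 16 elements each, 7-point right overlap.
    print(overlap_exchange_plan([(0, 16), (16, 32), (32, 48)], 7))
    # Blocks shorter than the overlap pull data from more than one neighbour.
    print(overlap_exchange_plan([(0, 4), (4, 8), (8, 12)], 7))
```

The last process receives nothing because no data lies beyond its block; when a block is shorter than the overlap, a receiver needs data from more than one neighbour.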


DRI Use in APG-73 SAR


DRI Implementations Used

•  Application on MITRE DRI on MPI*: demonstration completed
•  Application on Mercury PAS/DRI: demonstrations underway
•  Application on SKY MPICH/DRI: demonstrations underway

*  MPI/Pro (MSTI) and MPICH demonstrated


Simple transition to DRI

•  #pragma splits loop over global data among threads
•  DRI: loop over local data


[Diagram: processing flow]

for:  Range compression, Inverse weighting
DRI-1:  Cornerturn
for:  Azimuth compression, Inverse weighting
DRI-2:  Overlap exchange
for:  Side-lobe clutter removal, Amplitude detection, Image data compression


Source: “HPEC-SI Demonstration: Common Imagery Processor (CIP) Demonstration - APG73 Image Formation”, Brian Sroka, The MITRE Corporation, HPEC 2002 workshop


Portability: SLOC Comparison


[Figure: bar chart of source lines of code (SLOC) for the Sequential VSIPL, Shared Memory VSIPL, and DRI VSIPL versions]


•  5% SLOC increase for DRI includes code for:

   >  2 scatter / gather reorgs
   >  3 cornerturn data reorg cases (1 for interleaved complex + 2 for split complex data format)
   >  3 overlap exchange data reorg cases (1 for interleaved complex + 2 for split complex data format)
   >  managing interoperation between DRI and VSIPL libraries


Using  DRI  requires  much  less  source  code  than 
manual  distributed-memory  implementation 


Source: “HPEC-SI Demonstration: Common Imagery Processor (CIP) Demonstration - APG73 Image Formation”, Brian Sroka, The MITRE Corporation, HPEC 2002 workshop


CIP  APG-73  DRI  Conclusions 


•  Applying DRI to operational software does not greatly affect
   software lines of code

•  DRI greatly reduces the complexity of developing portable
   distributed-memory software (shared-memory transition easy)

•  Communication code in DRI is estimated at 6x fewer SLOC than
   if implemented manually with MPI

•  No code changed to retarget application (MITRE DRI on MPI)

•  Features missing from DRI (future needs):

   >  Split complex
   >  Dynamic (changing) distributions
   >  Round-robin distributions
   >  Piecemeal data production / consumption
   >  Non-CPU endpoints


Source:  “HPEC-SI  Demonstration:  Common  Imagery  Processor  (CIP)  Demonstration  -  APG73  Image  Formation”,  Brian  Sroka,  The  MITRE  Corporation,  HPEC  2002 
Source:  “High  Performance  Embedded  Computing  Software  Initiative  (HPEC-SI)”,  Dr.  Jeremy  Kepner,  MIT  Lincoln  Laboratory,  June  2003  HPEC-SI  meeting 




Vendor  DRI  Status 

Mercury  Computer  Systems,  Inc. 

MPI  Software  Technology,  Inc. 

SKY  Computers,  Inc. 




•  Commercially available in PAS-4.0.0 (Jul-03)

   >  Parallel Acceleration System (PAS) middleware product
   >  DRI interface to existing PAS features

•  The vast majority of DRI-1.0 is supported

   >  Not yet supported: block-cyclic, toroidal, some replication

•  Additional PAS features compatible with DRI

   >  Optional: applications can use PAS and DRI APIs together

•  Applications can use MPI & PAS/DRI

   >  Example: independent use of PAS/DRI and MPI libraries by
      the same application is possible (libraries not integrated)
Hybrid use of PAS and DRI APIs

PAS communication features:

>  User-driven buffering & synchronization
>  Dynamically changing transfer attributes
>  Dynamic process sets
>  I/O or memory device integration
>  Transfer only a Region of Interest (ROI)


[Diagram labels: Standard DRI_Distribution Object; Standard DRI_Blockinfo Object]

Built  on  Existing  PAS  Performance 


[Figure: 1K x 1K complex matrix transpose on Mercury Computer Systems; y-axis: standard DRI overhead (%) relative to PAS; x-axis: number of processors]


DRI Adds No Significant Overhead

DRI Achieves PAS Performance!
MPI Software Technology, Inc.


MPI  Software  Technology  has  released 
its  ChaMPIon/Pro  (MPI-2.1  product)  this 
spring 

Work  now  going  on  to  provide  DRI  “in 
MPI  clothing”  as  add-on  to 
ChaMPIon/Pro 

Confirmed  targets  are  as  follows: 

>  Linux clusters with TCP/IP, Myrinet, InfiniBand
>  Mercury RACE/RapidIO Multicomputers

Access  to  early  adopters:  1Q04 


•  More  info  available  from: 
tony@mpi-softtech.com

(Tony  Skjellum) 


SKY COMPUTERS

A SUBSIDIARY OF ANALOGIC CORPORATION


SKY Computers, Inc. (1/2)


Initial  Implementation 


Experimental  version  implemented  for 
SKYchannel 

Integrated  with  MPI 

Achieving  excellent  performance  for 
system  sizes  at  least  through  128 
processors 
SKY Computers, Inc. (2/2)


SKY’s Plans


Fully  supported  implementation  with 
SMARTpac 

Part  of  SKY’s  plans  for  standards 
compliance 

Included  with  MPI  library 
Optimized  InfiniBand  performance 